ECA-Net Efficient Channel Attention #82
Conversation
Implement the ECA module by (1) adopting the original eca_module.py into the models folder, and (2) adding a use_eca option beside every instance of the SE layer
Thanks for the PR. I actually have an initial implementation locally that I'm running an experiment with, but the circular padding is a nice idea, so I might pull this one in in the end. Have there been any result-wise comparisons with the circular padding? Also, for mine, I find it much easier to quickly make sense of the shape intent using view vs combining transpose and squeeze/unsqueeze like the original authors did. So my forward is like:
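(The actual snippet was lost in extraction; below is a plausible reconstruction of the view-based forward described, with illustrative names, not necessarily the exact code.)

```python
import torch
import torch.nn as nn

class EcaModule(nn.Module):
    """ECA channel attention, expressing shape intent with view()."""
    def __init__(self, channels=None, kernel_size=3):
        super().__init__()
        # channels is unused here; it would feed the adaptive kernel-size rule
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=(kernel_size - 1) // 2, bias=False)

    def forward(self, x):
        # (B, C, H, W) -> global average pool -> (B, 1, C) for the 1d conv
        y = x.mean((2, 3)).view(x.shape[0], 1, -1)
        y = self.conv(y)
        # back to (B, C, 1, 1) so the gate broadcasts over the spatial dims
        y = y.view(x.shape[0], -1, 1, 1).sigmoid()
        return x * y.expand_as(x)
```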
The other design decision I was debating was whether to push the circular variant as a separate module or fold it into the existing one.
1. The circular padding 'seems' to have competitive or better results compared to the original code.
2. I have not seen code utilizing multiple attention modules (aside from SK+SE obviously, but SK isn't an attention module in the traditional sense). Sample toy code: se_layer=cbam, se, eca, etc. I think it could be a good opportunity to add other easy-to-implement attention modules too (such as CBAM), as in the sketch below.
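To make the toy idea concrete, a hedged sketch of a pluggable attention factory (create_attn is a hypothetical name; SEModule below is a minimal stand-in, and EcaModule is the sketch above):

```python
import torch.nn as nn
import torch.nn.functional as F

class SEModule(nn.Module):
    """Minimal squeeze-and-excitation block, for illustration only."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, channels // reduction, 1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, 1)

    def forward(self, x):
        y = x.mean((2, 3), keepdim=True)             # squeeze
        y = self.fc2(F.relu(self.fc1(y))).sigmoid()  # excite
        return x * y

def create_attn(name, channels):
    """Hypothetical factory: pick the attention block by string name."""
    if name is None:
        return nn.Identity()
    return {'se': SEModule, 'eca': EcaModule}[name](channels)
```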
Functionally similar: adjusted to rwightman's version of reshaping and viewing. Use F.pad for the circular ECA version for cleaner code
I rebased my code on the most recent master, cleaned up the code to express shape intent using view, and, in a similar vein, changed my hacky padding into an F.pad implementation. The 'circular' mode of F.pad is not properly documented, but at least it isn't bugged.
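A sketch of what the F.pad-based circular forward could look like, reusing the view-based reshaping above (names illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class CecaModule(nn.Module):
    """ECA with circular padding, via F.pad(mode='circular')."""
    def __init__(self, channels=None, kernel_size=3):
        super().__init__()
        # the conv does no padding of its own; F.pad supplies it circularly
        self.padding = (kernel_size - 1) // 2
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=0, bias=False)

    def forward(self, x):
        y = x.mean((2, 3)).view(x.shape[0], 1, -1)  # (B, 1, C)
        # wrap the channel dim so 'edge' channels see a full neighborhood
        y = F.pad(y, (self.padding, self.padding), mode='circular')
        y = self.conv(y)
        y = y.view(x.shape[0], -1, 1, 1).sigmoid()
        return x * y.expand_as(x)
```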
@vrandme great, thanks. I have another branch in progress; a few minor things to clean up in the meantime.
The one other thing regards the decision to have both ECA and CECA modules…
CamelCased EcaModule. Renamed all instances of ecalayer to EcaModule. eca_module.py->EcaModule.py
Make pylint happy (commas, unused imports, missed imports)
I am very interested in your select_kernel branch and have been following it closely. After your branch is merged into master, and if there is further interest (either from me or you), I might look into it again. My work depends on yours and not the other way around, so take your time. As for my code, I changed all the names, but I'm not sure if I changed them the way you'd like. As for the space after commas, I just followed pylint's instructions, which also pointed out some import issues (I had left out importing functional as F in my pushed code).
I have not done testing myself, but I think it would only make sense to use such modules concurrently and not sequentially. As for concurrent/parallel use, there would need to be a way to combine their results, which is another design decision (addition, softmax, etc.) that I am not equipped to handle at the moment. Also, I think the combinatorial explosion (order, selection, parallel vs sequential, etc.) makes it difficult to reason about how to design/test this multiple-attention idea.
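Purely to make the design question concrete, one toy way such a parallel combination could look (untested, illustrative only; averaging is just one of the options mentioned above):

```python
import torch.nn as nn

class ParallelAttn(nn.Module):
    """Run two attention modules concurrently and average their
    attended outputs. A speculative sketch, not part of this PR."""
    def __init__(self, attn_a, attn_b):
        super().__init__()
        self.attn_a, self.attn_b = attn_a, attn_b

    def forward(self, x):
        # each branch returns an already-gated feature map; average them
        return 0.5 * (self.attn_a(x) + self.attn_b(x))
```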
When the channel count is given, calculate the adaptive kernel size according to the original paper. Otherwise use the given kernel size (k_size), which defaults to 3
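For reference, the adaptive rule from the ECA paper is k = |log2(C)/γ + b/γ|_odd with γ = 2, b = 1; a minimal sketch (helper name assumed):

```python
import math

def eca_kernel_size(channels, gamma=2, beta=1):
    """Adaptive kernel size per the ECA paper: nearest odd value of
    log2(C)/gamma + beta/gamma."""
    t = int(abs(math.log2(channels) / gamma + beta / gamma))
    return t if t % 2 else t + 1  # force odd so the conv stays centered

# e.g. channels=256: log2(256)/2 + 1/2 = 4.5 -> t = 4 -> k = 5
```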
I think I am almost done with my pull request. The original repo's ResNet seems to implement it the way I did "originally", but the paper is somewhat inconsistent when presenting its findings under kernel size 3 vs adaptive. To recap: when the channel count is given, the kernel size is computed adaptively per the paper; otherwise the provided k_size (default 3) is used.
This was done for both ECA and CECA.
It is an evolution of BAM, which incorporated spatial attention; CBAM adds channel attention on top in a sequential manner. The spatial attention implemented in CBAM is quite simple and actually quite similar to the original ECA, differing mainly in the dimension it attends over (channel vs spatial). Maybe it could be combined with ECA sometime?
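For context, a sketch of CBAM's spatial-attention branch along the lines of the CBAM paper (a conv over channel-pooled maps, kernel size 7 there; details from memory, worth double-checking):

```python
import torch
import torch.nn as nn

class SpatialAttn(nn.Module):
    """CBAM-style spatial attention: a small conv over pooled descriptors,
    analogous to ECA's 1d conv but acting over spatial positions."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # channel-wise mean and max maps, stacked -> (B, 2, H, W)
        y = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True)[0]], dim=1)
        return x * self.conv(y).sigmoid()
```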
@vrandme I've merged this onto a new branch called attention. I'm currently in the process of merging attention with select_kernel and doing some fairly extensive refactoring that should hopefully leave things making sense at the end :)
ECA-Net Efficient Channel Attention
This is my initial implementation of ECA-Net's ECA module.
The ECA module is a highly efficient (in terms of FLOPs, throughput, and parameter count) way to implement channel attention.
The original paper (https://arxiv.org/abs/1910.03151) shows it to be competitive with, or outperforming, other well-known methods such as Squeeze-and-Excitation.
This initial pull request implements the eca_layer and applies it to the base ResNet that pytorch-image-models uses.
I also added a test network, "ecaresnext26tn_32x4d", which differs from "seresnext26tn_32x4d" only in that it replaces each SE layer with an eca_layer (it also does not have pretrained weights).
So far I have tested this code against "seresnext26tn_32x4d" on MNIST and CIFAR, and it looks okay.
This pull also includes my initial implementation of circular ECA (CECA).
The original ECA for some reason implemented channel-wise attention with zero (default) padding. However, since channels have no inherent ordering, there is no reason the channels at the 'edges' should be left without neighbors. To give each channel an equal number of neighbors to convolve with, I manually implemented circular padding.
Although my own testing showed it to be competitive with or better than the original ECA, I wanted further review before integrating it further into the codebase.
@rwightman what do you think?
Please feel free to point out errors, necessary style adjustments, etc. that need to be dealt with before this code can be pulled.
If further testing is required, please let me know the scope and extent of the testing you'd like to see.
Unfortunately, I can manage ImageNet training/testing only through a Google Colab account with its 30 hours of free GPU time, so please keep that in mind.
Thanks.